What are we analyzing?
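We aim to determine which correlation visualization methods are most
effective for small and large datasets. The small dataset is mtcars
(32 rows, 11 numeric variables); the large one is a 1,000-row sample of
diamonds (53,940 rows in full).
We will use Heatmaps for a quick overview of correlations and Scatter
Matrix plots for a detailed examination of relationships between
variables.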
What the code does:
• Imports the necessary libraries for working with data and
visualizations.
library(plotly)
library(ggplot2) # For diamonds
library(dplyr)
• Loads the mtcars dataset.
• Calculates the correlation matrix for all numeric variables.
data("mtcars")
small_data <- mtcars
small_corr <- round(cor(small_data), 2)
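To read the strongest relationships directly from the matrix instead of from colors, the variable pairs can be ranked by absolute correlation. This is a minimal base-R sketch; the helper name `top_correlations` and its `n` argument are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: rank variable pairs by absolute correlation.
top_correlations <- function(corr, n = 5) {
  # Keep each pair once by taking the upper triangle (diagonal excluded).
  idx <- which(upper.tri(corr), arr.ind = TRUE)
  pairs <- data.frame(
    var1 = rownames(corr)[idx[, "row"]],
    var2 = colnames(corr)[idx[, "col"]],
    r    = corr[idx],
    stringsAsFactors = FALSE
  )
  # Strongest relationships first, regardless of sign.
  pairs <- pairs[order(-abs(pairs$r)), ]
  rownames(pairs) <- NULL
  head(pairs, n)
}

small_corr <- round(cor(mtcars), 2)
top_correlations(small_corr, n = 5)
```

The upper triangle is used because a correlation matrix is symmetric, so each pair would otherwise be listed twice.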
• Creates a Heatmap to visualize the correlations.
fig1 <- plot_ly(
  x = colnames(small_corr), # X-axis: variable names
  y = colnames(small_corr), # Y-axis: variable names
  z = small_corr, # Cell color: correlation values
  type = "heatmap", # Specify the type as heatmap
  colorscale = "Viridis", # Color scale for the heatmap
  text = round(small_corr, 2), # Text to display on hover
  hoverinfo = "x+y+text" # Information to display on hover
) %>%
  layout(
    title = "Heatmap of Correlation (Small Dataset: mtcars)", # Title of the plot
    xaxis = list(title = "Variables"), # X-axis title
    yaxis = list(title = "Variables"), # Y-axis title
    annotations = list(
      x = rep(colnames(small_corr), each = nrow(small_corr)), # X positions for annotations
      y = rep(colnames(small_corr), ncol(small_corr)), # Y positions for annotations
      text = as.character(round(small_corr, 2)), # Text for annotations
      showarrow = FALSE, # Hide arrows in annotations
      font = list(size = 12, color = "white") # Font size and color for annotations
    )
  )
fig1
About the plot:
The Heatmap displays the correlations between numeric variables in the
mtcars dataset.
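• Yellow color: values at the top of the scale (strong positive
correlations).
• Purple color: values at the bottom of the scale (strong negative
correlations).
• Green and blue tones in between: weak correlations, close to zero.
This allows for a quick identification of the strongest and weakest
relationships.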
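By default the color scale is stretched over the range of values present in the matrix, so the same color can stand for different correlation values in different heatmaps. A minimal sketch of one way to make heatmaps comparable, assuming a fixed range of -1 to 1 and the diverging `RdBu` scale are acceptable here (neither appears in the original code; `fig1_fixed` is an illustrative name):

```r
library(plotly)

small_corr <- round(cor(mtcars), 2)

# Assumption: pin the color range to the theoretical limits of a correlation
# so that the midpoint of the scale corresponds to zero.
fig1_fixed <- plot_ly(
  x = colnames(small_corr),
  y = rownames(small_corr),
  z = small_corr,
  type = "heatmap",
  colorscale = "RdBu",
  zmin = -1,
  zmax = 1
) %>%
  layout(title = "Heatmap of Correlation (fixed color range)")

fig1_fixed
```

With the range pinned, a given color means the same correlation value in every heatmap built this way.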
• Generates a Scatter Matrix for key variables mpg, hp, wt, and qsec in the mtcars dataset.
fig2 <- plot_ly(
  data = small_data, # Data source
  type = "splom", # Specify the type as scatter plot matrix
  dimensions = list(
    list(label = "mpg", values = ~mpg), # Define the first dimension
    list(label = "hp", values = ~hp), # Define the second dimension
    list(label = "wt", values = ~wt), # Define the third dimension
    list(label = "qsec", values = ~qsec) # Define the fourth dimension
  )
) %>%
  layout(
    title = "Scatter Matrix (Small Dataset: mtcars)" # Title of the plot
  )
fig2
About the plot:
The Scatter Matrix visualizes pairwise relationships between key
variables in the mtcars dataset. The diagonal panels plot each variable
against itself, so they carry no distribution information. For example,
mpg shows a strong negative correlation with hp (about -0.78) and wt
(about -0.87).
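In a plotly scatter matrix the diagonal cells plot each variable against itself, and the upper half mirrors the lower half. A sketch of one way to remove this redundancy, using the `diagonal` and `showupperhalf` attributes of the `splom` trace; the name `fig2_compact` is illustrative:

```r
library(plotly)

fig2_compact <- plot_ly(
  data = mtcars,
  type = "splom",
  dimensions = list(
    list(label = "mpg", values = ~mpg),
    list(label = "hp", values = ~hp),
    list(label = "wt", values = ~wt),
    list(label = "qsec", values = ~qsec)
  ),
  diagonal = list(visible = FALSE), # Hide the variable-vs-itself cells
  showupperhalf = FALSE # Keep only the lower triangle
) %>%
  layout(title = "Scatter Matrix (lower triangle only)")

fig2_compact
```

Dropping the mirrored half leaves each pair of variables shown once, which gives the remaining panels more room.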
• Samples 1,000 rows from the diamonds dataset.
• Computes the correlation matrix for numeric variables.
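data("diamonds")
set.seed(123) # Fixed seed (arbitrary value) so the same rows are drawn on every run
large_data <- diamonds %>% slice_sample(n = 1000) # Random sample of 1,000 rows
large_corr <- large_data %>%
  select(where(is.numeric)) %>% # Keep numeric columns only
  cor() %>%
  round(2)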
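Since the correlation matrix is computed from a 1,000-row sample, it can be worth checking how far the sampled correlations drift from those of the full dataset. A minimal sketch, with an arbitrary seed chosen only for illustration; the names `full_corr`, `sample_corr`, and `max_drift` are not part of the original code:

```r
library(ggplot2) # Provides the diamonds dataset
library(dplyr)

set.seed(1) # Arbitrary seed, assumed here only to make the sample repeatable

# Correlations on all rows
full_corr <- diamonds %>%
  select(where(is.numeric)) %>%
  cor()

# Correlations on a 1,000-row random sample
sample_corr <- diamonds %>%
  slice_sample(n = 1000) %>%
  select(where(is.numeric)) %>%
  cor()

# Largest absolute difference between any sampled and full-data correlation
max_drift <- max(abs(sample_corr - full_corr))
max_drift
```

A small `max_drift` suggests the sample is adequate for a visual overview; a large one would argue for a bigger sample.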
• Creates a Heatmap to visualize the correlations.
fig3 <- plot_ly(
  x = colnames(large_corr), # X-axis: variable names
  y = colnames(large_corr), # Y-axis: variable names
  z = large_corr, # Z-axis: correlation values
  type = "heatmap", # Specify the type as heatmap
  colorscale = "Viridis", # Color scale for the heatmap
  text = round(large_corr, 2), # Text to display on hover
  hoverinfo = "x+y+text" # Information to display on hover
) %>%
  layout(
    title = "Heatmap of Correlation (Large Dataset: diamonds)", # Title of the plot
    xaxis = list(title = "Variables"), # X-axis title
    yaxis = list(title = "Variables"), # Y-axis title
    annotations = list(
      x = rep(colnames(large_corr), each = nrow(large_corr)), # X positions for annotations
      y = rep(colnames(large_corr), ncol(large_corr)), # Y positions for annotations
      text = as.character(round(large_corr, 2)), # Text for annotations
      showarrow = FALSE, # Hide arrows in annotations
      font = list(size = 12, color = "white") # Font size and color for annotations
    )
  )
fig3
About the plot:
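The Heatmap shows correlations between numeric variables in the diamonds
subset. Strong positive correlations are visible between carat and
size-related variables (x, y, z), highlighted in yellow. No pair of
variables in this dataset is strongly negatively correlated, so the
purple cells here mark weak or mildly negative correlations, not strong
negative ones.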
• Generates a Scatter Matrix for all numeric variables in the diamonds dataset sample.
numeric_data <- large_data[sapply(large_data, is.numeric)]
fig4 <- plot_ly(
  data = numeric_data, # Data source
  type = "splom", # Specify the type as scatter plot matrix
  dimensions = lapply(names(numeric_data), function(col) {
    list(label = col, values = numeric_data[[col]]) # Define each dimension dynamically
  })
) %>%
  layout(
    title = "Scatter Matrix (Large Dataset: diamonds)", # Title of the plot
    margin = list(b = 50) # Adjust bottom margin
  )
fig4
About the plot:
The Scatter Matrix for the diamonds dataset sample shows pairwise
relationships between numeric variables. For instance, carat has a clear
positive relationship with x, y, and z; it is slightly curved rather than
strictly linear, since weight grows with volume.
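With 1,000 points in each of the 49 panels (7 numeric variables), markers overlap heavily. A sketch of one common mitigation, smaller semi-transparent markers; the size and opacity values, the seed, and the name `fig4_faded` are assumptions to adjust by eye:

```r
library(plotly)
library(ggplot2) # Provides the diamonds dataset

set.seed(1) # Arbitrary seed for a repeatable illustration
sampled <- diamonds[sample(nrow(diamonds), 1000), ]
numeric_data <- as.data.frame(sampled[sapply(sampled, is.numeric)])

fig4_faded <- plot_ly(
  type = "splom",
  dimensions = lapply(names(numeric_data), function(col) {
    list(label = col, values = numeric_data[[col]])
  }),
  marker = list(size = 3, opacity = 0.3) # Assumed values, adjust as needed
) %>%
  layout(title = "Scatter Matrix (semi-transparent markers)")

fig4_faded
```

Dense regions then appear darker than sparse ones, which makes the shape of each relationship easier to see.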
Key Findings:
1. Heatmap:
• Effective for quickly assessing correlations in both small and large
datasets.
• Color gradients make it easy to identify the strongest and weakest
relationships.
2. Scatter Matrix:
• More informative for detailed pairwise analysis of variables.
• Suitable for small datasets or selected subsets of variables in large
datasets.